Amazon: Titan Text Express

This is the transparency report for Amazon for the Titan Text Express model. To see their responses for each indicator, click through the various domains and subdomains. For further information, visit the website for the May 2024 Foundation Model Transparency Index.

Data size (Score: 0)

For the data used in building the model, is the data size disclosed?

Disclosure: Not disclosed

Note: Data size should be reported in appropriate units (e.g. bytes, words, tokens, images, frames) and broken down by modality. Data size should be reported to a precision of one significant figure (e.g. 4 trillion tokens, 200 thousand images). No form of decomposition into data phases is required.

References: Not disclosed

Justification: Not disclosed

New disclosure? No
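
The note above asks for data size to a precision of one significant figure. As a minimal sketch of that rounding convention (the function name and the input values are hypothetical and are not figures disclosed by Amazon), the calculation could look like this:

```python
import math


def round_to_one_sig_fig(value: float) -> float:
    """Round a number to one significant figure (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Order of magnitude of the leading digit, e.g. 12 for 4.3e12.
    exponent = math.floor(math.log10(abs(value)))
    factor = 10 ** exponent
    # Keep only the leading digit, then scale back up.
    return round(value / factor) * factor


# Hypothetical counts, used only to illustrate the reporting precision.
print(round_to_one_sig_fig(4_300_000_000_000))  # token count
print(round_to_one_sig_fig(187_000))            # image count
```

The same convention applies to the other indicators in this report that request one significant figure, such as compute, hardware count, energy, and emissions.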

Data sources (Score: 0)

For all data used in building the model, are the data sources disclosed?

Disclosure: Not disclosed

Note: To receive this point, a meaningful decomposition of sources must be listed in an understandable way (e.g. named URLs/domains/databases/data providers). It does not suffice to say data is "sourced from the Internet" or comes from "licensed sources".

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Data creators (Score: 0)

For all data used in building the model, is there some characterization of the people who created the data?

Disclosure: Not disclosed

Note: While information about data creators may not be easily discernible for some data scraped from the web, the general sources (URLs/domains) should be listed, and, for other data that is bought, licensed, or collected, a reasonable attempt at characterizing the underlying people who provided the data is required to receive this point. The relevant properties of people can vary depending on context: for example, relevant properties could include demographic information like fraction of Black individuals contributing to the dataset, geographic information like fraction of European individuals contributing to the dataset, language information like fraction of L1 English speakers, or occupational information like the fraction of professional artists.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Data source selection (Score: 0)

Are the selection protocols for including and excluding data sources disclosed?

Disclosure: Not disclosed

Note: Selection protocols refer to procedures used to choose which datasets or subsets of datasets will be used to build a model. We will award this point even if the selection protocols are non-exhaustive.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Data curation (Score: 0)

For all data sources, are the curation protocols for those data sources disclosed?

Disclosure: Not disclosed

Note: Curation protocols refer to steps taken to further modify data sources, such as procedures to manage, annotate, and organize data. The aims of curation might include improving the quality, relevance, and representativeness of the data. We will award this point if the developer reports that it does not perform any further curation beyond the data sources.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Data augmentation (Score: 0)

Are any steps the developer takes to augment its data sources disclosed?

Disclosure: Not disclosed

Note: Such steps might include augmenting data sources with synthetic data. We will award this point if the developer reports that it does not take any steps to augment its data.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Harmful data filtration (Score: 0)

If data is filtered to remove harmful content, is there a description of the associated filter?

Disclosure: In the blog post, we stated that: "For our Titan Foundation Models, AWS uses data from the following sources for training: (1) data licensed from third parties; (2) open-source datasets; and (3) publicly-available data where appropriate. Before including datasets in the Titan Foundation Models’ training data, AWS reviews them for possible bias, toxicity, legal, and other quality considerations." In the Titan Text AI Service Card, we also mentioned that: "We tune safety filters (such as privacy-protecting and profanity-blocking filters) to block or evade potentially harmful prompts and responses to further increase alignment with our design goals."

Note: Such harmful content might relate to violence or child sexual abuse material. We will award this point if the developer reports that it does not perform any harmful data filtration.

References: https://aws.amazon.com/uki/cloud-services/uk-gov-ai-safety-summit/; https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: While the disclosure provides useful information, it does not provide a clear description of the filters for harmful data.

New disclosure? No

Copyrighted data (Score: 0)

For all data used in building the model, is the associated copyright status disclosed?

Disclosure: In the blog post, we stated that: "For our Titan Foundation Models, AWS uses data from the following sources for training: (1) data licensed from third parties; (2) open-source datasets; and (3) publicly-available data where appropriate. Before including datasets in the Titan Foundation Models’ training data, AWS reviews them for possible bias, toxicity, legal, and other quality considerations."

Note: To receive this point, the copyright status (e.g. copyrighted, public domain) must relate to some decomposition of the data. We will award this point if there is some meaningful decomposition of the data, even if the decomposition is insufficient to receive the Data Creators point or if the disclosure is not comprehensive relative to legal copyright standards.

References: https://aws.amazon.com/uki/cloud-services/uk-gov-ai-safety-summit/

Justification: While the disclosure provides useful information, it does not provide a clear description of the copyright status for the data used to build the model.

New disclosure? No

Data license (Score: 0)

For all data used in building the model, is the associated license status disclosed?

Disclosure: "For our Titan Foundation Models, AWS uses data from the following sources for training: (1) data licensed from third parties; (2) open-source datasets; and (3) publicly-available data where appropriate. Before including datasets in the Titan Foundation Models’ training data, AWS reviews them for possible bias, toxicity, legal, and other quality considerations. We store training data in secure repositories that are subject to robust access controls consistent with AWS’s state of the art data security policies. We treat training data as confidential AWS information, and apply appropriate security and access controls. In particular, we encrypt all data in transit and data that is persisted at rest."

Note: To receive this point, the license status must relate to some decomposition of the data. We will award this point if there is some meaningful decomposition of the data, even if the decomposition is insufficient to receive the Data Creators point.

References: https://aws.amazon.com/uki/cloud-services/uk-gov-ai-safety-summit/

Justification: While the disclosure provides useful information on licensing, it does not provide a clear description of the license status for the data used to build the model.

New disclosure? No

Personal information in data (Score: 0)

For all data used in building the model, is the inclusion or exclusion of personal information in that data disclosed?

Disclosure: Not disclosed

Note: To receive this point, the disclosure of personal information must relate to some decomposition of the data. We will award this point if there is some meaningful decomposition of the data, even if the decomposition is insufficient to receive the Data Creators point. Additionally, we will award this point if the developer reports the inclusion of personal information, independent of if and how they mitigate related privacy concerns.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Use of human labor (Score: 1)

Are the phases of the data pipeline where human labor is involved disclosed?

Disclosure: Human labor is involved in the SFT annotation, RLHF process, and safety filtering. In the Titan Text AI Service Card, we noted: "Our development process exercises these control levers as follows: 1/ we pre-train the LLM using curated data from a variety of sources, including licensed and proprietary data, open source datasets, and publicly available data where appropriate; 2/ we adjust model weights via supervised fine tuning (SFT) and reinforcement learning with human feedback (RLHF) to increase the alignment between Titan Text LLMs and our design goals; and 3/ we tune safety filters (such as privacy-protecting and profanity-blocking filters) to block or evade potentially harmful prompts and responses to further increase alignment with our design goals."

Note: Phases of the data pipeline that involve human labor include activities and tasks performed by people to collect, annotate, clean, or validate data. This indicator is inclusive of all data that is created by or on behalf of the developer. We will award this point if the developer gives a reasonable best-effort description of the use of human labor in their data pipeline.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Human labor is involved in the SFT annotation, RLHF process, and safety filtering.

New disclosure? No

Employment of data laborers (Score: 0)

Is the organization that directly employs the people involved in data labor disclosed for each phase of the data pipeline?

Disclosure: Not disclosed

Note: Phases of the data pipeline that involve human labor include activities and tasks performed by people to collect, annotate, clean, or validate data. This indicator is inclusive of all data that is created by or on behalf of the developer. We will award this point if the developer provides the name of the organization that employs data laborers, even if other details about the employment relationship are not disclosed.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Geographic distribution of data laborers (Score: 0)

Is geographic information regarding the people involved in data labor disclosed for each phase of the data pipeline?

Disclosure: Not disclosed

Note: This indicator is inclusive of all data that is created by or on behalf of the developer. We will award this point if the developer gives a reasonable best-effort description of the geographic distribution of labor at the country-level.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Wages (Score: 0)

Are the wages for people who perform data labor disclosed?

Disclosure: Not disclosed

Note: This indicator is inclusive of data labor at all points of the model development process, such as training data annotation or red teaming data used to control the model. It is also inclusive of all data that is created by or on behalf of the developer. We will award this point if the developer reports that it does not compensate workers.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Instructions for creating data (Score: 0)

Are the instructions given to people who perform data labor disclosed?

Disclosure: Not disclosed

Note: This indicator is inclusive of all data that is created by or on behalf of the developer. We will award this point if the developer makes a reasonable best-effort attempt to disclose instructions given to people who create data used to build the model for the bulk of the data phases involving human labor.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Labor protections (Score: 0)

Are the labor protections for people who perform data labor disclosed?

Disclosure: Not disclosed

Note: This indicator is inclusive of data labor at all points of the model development process, such as training data annotation or red teaming data used to control the model. It is also inclusive of all data that is created by or on behalf of the developer. As an example, labor protections might include protocols to reduce the harm to workers' mental health stemming from exposure to violent content when annotating training data. We will award this point if the developer reports that it does not protect workers or if it does not use data laborers and therefore has no labor protections.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Third party partners (Score: 0)

Are the third parties who were or are involved in the development of the model disclosed?

Disclosure: Not disclosed

Note: This indicator is inclusive of partnerships that go beyond data labor as there may be third party partners at various stages in the model development process. We will award this point if the developer reports that it was the sole entity involved in the development of the model.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Queryable external data access (Score: 0)

Are external entities provided with queryable access to the data used to build the model?

Disclosure: Not disclosed

Note: We will award this point for any reasonable mechanism for providing access: direct access to the data, an interface to query the data, a developer-mediated access program where developers can inspect requests, etc. Developers may receive this point even if there are rate-limits on the number of queries permitted to an external entity and restrictions on which external entities are given access, insofar as these limits and restrictions are transparent and ensure a reasonable amount of external access. We may accept justifications for prohibiting queries of specific parts of the data.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Direct external data access (Score: 0)

Are external entities provided with direct access to the data used to build the model?

Disclosure: Not disclosed

Note: We will award this point if external entities can directly access the data without any form of gating from the developer. With that said, we may award this point if the developer provides justifications for prohibiting access to specific parts of the data or to unauthorized external entities.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Compute usage (Score: 0)

Is the compute required for building the model disclosed?

Disclosure: Not disclosed

Note: Compute should be reported in appropriate units, which most often will be floating point operations (FLOPs). Compute should be reported to a precision of one significant figure (e.g. $5 \times 10^{25}$ FLOPs). We will award this point even if there is no decomposition of the reported compute usage into compute phases, but it should be clear whether the reported compute usage is for a single model run or includes additional runs, or hyperparameter tuning, or training other models like reward models, or other steps in the model development process that necessitate compute expenditure.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Development duration (Score: 0)

Is the amount of time required to build the model disclosed?

Disclosure: Not disclosed

Note: The continuous duration of time required to build the model should be reported in weeks, days, or hours to a precision of one significant figure (e.g. 3 weeks). No form of decomposition into phases of building the model is required for this indicator, but it should be clear what the duration refers to (e.g. training the model, training and subsequent evaluation and red teaming).

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Compute hardware (Score: 0)

For the primary hardware used to build the model, is the amount and type of hardware disclosed?

Disclosure: Not disclosed

Note: In most cases, this indicator will be satisfied by information regarding the number and type of GPUs or TPUs used to train the model. The number of hardware units should be reported to a precision of one significant figure (e.g. 800 NVIDIA H100 GPUs). We will not award this point if (i) the training hardware generally used by the developer is disclosed, but the specific hardware for the given model is not, or (ii) the training hardware is disclosed, but the amount of hardware is not. We will award this point even if information about the interconnects between hardware units is not disclosed.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Hardware owner (Score: 0)

For the primary hardware used in building the model, is the owner of the hardware disclosed?

Disclosure: Not disclosed

Note: For example, the hardware owner may be the model developer in the case of a self-owned cluster, a cloud provider like Microsoft Azure, Google Cloud Platform, or Amazon Web Services, or a national supercomputer. In the event that hardware is owned by multiple sources or is highly decentralized, we will award this point if a developer makes a reasonable effort to describe the distribution of hardware owners.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Energy usage (Score: 0)

Is the amount of energy expended in building the model disclosed?

Disclosure: Not disclosed

Note: Energy usage should be reported in appropriate units, which most often will be megawatt-hours (MWh). Energy usage should be reported to a precision of one significant figure (e.g. 500 MWh). No form of decomposition into compute phases is required, but it should be clear whether the reported energy usage is for a single model run or includes additional runs, or hyperparameter tuning, or training other models like reward models, or other steps in the model development process that necessitate energy usage.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Carbon emissions (Score: 0)

Is the amount of carbon emitted (associated with the energy used) in building the model disclosed?

Disclosure: Not disclosed

Note: Emissions should be reported in appropriate units, which most often will be tons of carbon dioxide emitted (tCO2). Emissions should be reported to a precision of one significant figure (e.g. 500 tCO2). No form of decomposition into compute phases is required, but it should be clear whether the reported emissions is for a single model run or includes additional runs, or hyperparameter tuning, or training other models like reward models, or other steps in the model development process that generate emissions.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Broader environmental impact (Score: 0)

Are any broader environmental impacts from building the model besides carbon emissions disclosed?

Disclosure: Not disclosed

Note: While the most direct environmental impact of building a foundation model is the energy used and, therefore, the potential carbon emissions, there may be other environmental impacts. For example, these may include the use of other resources such as water for cooling data centers or metals for producing specialized hardware. We recognize that there does not exist an authoritative or consensus list of broader environmental factors. For this reason, we will award this point if there is a meaningful, though potentially incomplete, discussion of broader environmental impact.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Model stages (Score: 1)

Are all stages in the model development process disclosed?

Disclosure: In the Titan Text AI Service Card, we noted: "Our development process exercises these control levers as follows: 1/ we pre-train the LLM using curated data from a variety of sources, including licensed and proprietary data, open source datasets, and publicly available data where appropriate; 2/ we adjust model weights via supervised fine tuning (SFT) and reinforcement learning with human feedback (RLHF) to increase the alignment between Titan Text LLMs and our design goals; and 3/ we tune safety filters (such as privacy-protecting and profanity-blocking filters) to block or evade potentially harmful prompts and responses to further increase alignment with our design goals."

Note: Stages refer to each identifiable step that constitutes a substantive change to the model during the model building process. We recognize that different developers may use different terminology for these stages, or conceptualize the stages differently. We will award this point if there is a clear and complete description of these stages.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Three stages: pretraining, supervised fine-tuning and reinforcement learning with human feedback, safety filtration.

New disclosure? No

Model objectives (Score: 1)

For all stages that are described, is there a clear description of the associated learning objectives or a clear characterization of the nature of this update to the model?

Disclosure: In the Titan Text AI Service Card, we noted: "Our development process exercises these control levers as follows: 1/ we pre-train the LLM using curated data from a variety of sources, including licensed and proprietary data, open source datasets, and publicly available data where appropriate; 2/ we adjust model weights via supervised fine tuning (SFT) and reinforcement learning with human feedback (RLHF) to increase the alignment between Titan Text LLMs and our design goals; and 3/ we tune safety filters (such as privacy-protecting and profanity-blocking filters) to block or evade potentially harmful prompts and responses to further increase alignment with our design goals."

Note: We recognize that different developers may use different terminology for these stages, or conceptualize the stages differently. We will award this point if there is a clear description of the update to the model related to each stage, whether that is the intent of the stage (e.g. making the model less harmful), a mechanistic characterization (e.g. minimizing a specific loss function), or an empirical assessment (e.g. evaluation results conducted before and after the stage).

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Pretraining objective - next-word prediction. SFT and RLHF objectives to increase the alignment between Titan Text LLMs and developer design goals.

New disclosure? No

Core frameworks (Score: 0)

Are the core frameworks used for model development disclosed?

Disclosure: In the Titan Text AI Service Card, we noted: "Titan Text LLMs perform token inference using transformer-based generative machine learning. They work as follows: given a sequence of tokens (the prompt), they predict the next most likely token (first completion token), add the token to the previous input sequence, predict the next token, and keep iterating until some prescribed stopping condition is met (e.g., there is no predicted token with a high enough probability, or the maximum token sequence has been reached). Titan models predict the next token in a token sequence using a probability distribution learned through a combination of unsupervised and supervised machine learning techniques. Our runtime service architecture works as follows: 1/ Titan Text receives a user prompt via the API or Console; 2/ Titan Text filters the prompt to comply with safety, fairness and other design goals; 3/ Titan Text augments the filtered prompt to support user-requested features, e.g., knowledge-base retrieval; 4/ Titan Text generates a completion; 5/ Titan Text filters the completion for safety and other concerns; 6/ Titan Text returns the final completion."

Note: Examples of core frameworks include Tensorflow, PyTorch, Jax, Hugging Face Transformers, Seqio, T5X, Keras, SciKit, and Triton. If there are significant internal frameworks, there should be some description of their function and/or a reasonably similar publicly-available analogue. We recognize that there does not exist an authoritative or consensus list of core frameworks. For this reason, we will award this point if there is a meaningful, though potentially incomplete, list of major frameworks for the first version of the index.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: While the disclosure provides useful information, it does not provide a clear description of the core frameworks.

New disclosure? No
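
The Core frameworks disclosure above describes an iterative next-token loop with two stopping conditions (no token with a high enough probability, or a maximum sequence length). The following is a minimal sketch of that loop, not Amazon's implementation: the lookup-table "model", the `min_prob` threshold, and the `<eos>` marker are all hypothetical stand-ins, and a production system may sample from the distribution rather than always taking the most likely token.

```python
from typing import Callable, Dict, List

# Hypothetical toy "model": maps the last token to next-token probabilities.
TOY_TABLE: Dict[str, Dict[str, float]] = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"<eos>": 0.9, "down": 0.1},
}


def toy_next_token_probs(tokens: List[str]) -> Dict[str, float]:
    """Return a next-token distribution conditioned on the last token only."""
    return TOY_TABLE.get(tokens[-1], {})


def generate(
    prompt: List[str],
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    max_tokens: int = 10,
    min_prob: float = 0.2,
    eos: str = "<eos>",
) -> List[str]:
    """Greedy decoding: append the most likely token until a stop condition."""
    tokens = list(prompt)
    while len(tokens) < max_tokens:  # stop: maximum sequence length reached
        probs = next_token_probs(tokens)
        if not probs:
            break  # the toy model has no prediction for this context
        token, prob = max(probs.items(), key=lambda kv: kv[1])
        if prob < min_prob or token == eos:
            break  # stop: no sufficiently likely token, or end marker
        tokens.append(token)
    return tokens


print(generate(["the"], toy_next_token_probs))
```

Passing the predictor as a function keeps the loop independent of any particular model; the prompt and completion filters in the runtime architecture described above would wrap a call like this rather than live inside it.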

Additional dependencies (Score: 0)

Are any dependencies required to build the model disclosed besides data, compute, and code?

Disclosure: Not disclosed

Note: For example, if the model depends on an external search engine, programmable APIs, or tools, this should be disclosed. We recognize that there is not widespread consensus regarding what constitutes key dependencies beyond the data, compute, and code. We will award this point only if developers give a reasonable best-effort description of any additional dependencies or make clear that no additional dependencies are required.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Mitigations for privacy (Score: 0)

Are any steps the developer takes to mitigate the presence of PII in the data disclosed?

Disclosure: Not disclosed

Note: Such steps might include identifying personal information in the training data, filtering specific datasets to remove personal information, and reducing the likelihood that models will output personal information. We will award this point if the developer reports that it does not take steps to mitigate the presence of PII in the data.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Mitigations for copyright (Score: 0)

Are any steps the developer takes to mitigate the presence of copyrighted information in the data disclosed?

Disclosure: Not disclosed

Note: Such steps might include identifying copyrighted data, filtering specific datasets to remove copyrighted data, and reducing the likelihood that models will output copyrighted information. We will award this point if the developer reports that it does take steps to mitigate the presence of copyrighted information in the data.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Input modality (Score: 1)

Are the input modalities for the model disclosed?

Disclosure: Text

Note: Input modalities refer to the types or formats of information that the model can accept as input. Examples of input modalities include text, image, audio, video, tables, graphs.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Output modality (Score: 1)

Are the output modalities for the model disclosed?

Disclosure: Text

Note: Output modalities refer to the types or formats of information that the model can produce as output. Examples of output modalities include text, image, audio, video, tables, graphs.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Model components (Score: 1)

Are all components of the model disclosed?

Disclosure: The model is a single-component decoder-only autoregressive Transformer.

Note: Model components refer to distinct and identifiable parts of the model. We recognize that different developers may use different terminology for model components, or conceptualize components differently. Examples include: (i) For a text-to-image model, components could refer to a text encoder and an image encoder, which may have been trained separately. (ii) For a retrieval-augmented model, components could refer to a separate retriever module.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Model size (Score: 0)

For all components of the model, is the associated model size disclosed?

Disclosure: Not disclosed

Note: This information should be reported in appropriate units, which generally is the number of model parameters, broken down by named component. Model size should be reported to a precision of one significant figure (e.g. 500 billion parameters for text encoder, 20 billion parameters for image encoder).

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Model architecture (Score: 1)

Is the model architecture disclosed?

Disclosure: The model is a single-component decoder-only autoregressive Transformer.

Note: Model architecture is the overall structure and organization of a foundation model, which includes the way in which any disclosed components are integrated and how data moves through the model during training or inference. We recognize that different developers may use different terminology for model architecture, or conceptualize the architecture differently. We will award this point for any clear, though potentially incomplete, description of the model architecture.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Centralized model documentation (Score: 1)

Is key information about the model included in a centralized artifact such as a model card?

Disclosure: An AI Service Card is provided.

Note: We recognize that different developers may share this information through different types of documentation, such as a system card or several clearly interrelated documents. We will award this point for the disclosure of any such centralized artifact that provides key information typically included in a model card, though the artifact may be longer-form than a standard model card (e.g. a technical report).

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

External model access protocol (Score: 1)

Is a protocol for granting external entities access to the model disclosed?

Disclosure: Titan Models are distributed via the Bedrock platform. In the user guide, customers can find the instructions to gain model access. The Bedrock User Guide provides detailed instructions on gaining model access on this page: https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html, which describes the access protocol as follows: "Access to Amazon Bedrock foundation models isn't granted by default. In order to gain access to a foundation model, an IAM user with sufficient permissions needs to request access to it through the console. Once access is provided to a model, it is available for all users in the account." The requirements for access are generally the same as those for acquiring an Amazon account.

Note: A model access protocol refers to the steps, requirements, and considerations involved in granting authorized model access to external entities. We will award this point if the developer discloses key details of its protocol, including (i) where external entities can request access (e.g. via an access request form); (ii) explicit criteria for selecting external entities; and (iii) a transparent decision on whether access has been granted within a specified, reasonable period of time.

References: https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html

Justification: Not disclosed

New disclosure? No

Blackbox external model access (Score: 1)

Is black box model access provided to external entities?

Disclosure: Titan Models are distributed via the Bedrock platform. In the user guide, customers can find the instructions to gain model access.

Note: Black box model access refers to the ability to query the model with inputs and receive outputs, potentially without further access. Examples of external entities that might be granted access include researchers, third-party auditors, and regulators. We will award this point for any reasonable access level: direct access to the model weights, an interface to query the model, a developer-mediated access program where developers can inspect requests, etc. Developers may receive this point even if there are rate-limits on the number of queries permitted to an external entity and restrictions on the external entities that are permitted access, insofar as these limits and restrictions are transparent.

References: https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html

Justification: Not disclosed

New disclosure? No
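
As an illustration of what black box access through Bedrock can look like in practice, the following is a minimal sketch rather than an official example. It assumes the boto3 SDK, the model identifier amazon.titan-text-express-v1, the us-east-1 region, and the Titan Text request and response fields (inputText, textGenerationConfig, results, outputText); the helper names build_titan_request and query_titan are hypothetical. All of these should be checked against the current Bedrock User Guide.

```python
import json

# Assumed model identifier for Titan Text Express; confirm it in the Bedrock console.
MODEL_ID = "amazon.titan-text-express-v1"


def build_titan_request(prompt, max_tokens=256, temperature=0.0, top_p=1.0):
    """Return the JSON request body for a single Titan Text prompt."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,
            "topP": top_p,
            "stopSequences": [],
        },
    })


def query_titan(prompt, region="us-east-1"):
    """Send one prompt to the model and return the generated text.

    Requires boto3, AWS credentials, and model access that has already
    been granted to the account through the Bedrock console.
    """
    import boto3  # imported lazily so the request builder works without it

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_titan_request(prompt),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["results"][0]["outputText"]
```

Calling query_titan("How many integers are in the interval 1 to 10 inclusive?") would return the model's completion as a string, provided the account has been granted access to the model.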

Full external model access (Score: 0)

Is full model access provided to external entities?

Disclosure: Not disclosed

Note: Full model access refers to the ability to access the model via the release of model weights. Developers may receive this point even if there are some restrictions on the external entities that are permitted access (e.g. geographic restrictions), insofar as these restrictions are transparent (e.g. via some high-level description of who has been granted access to the foundation model).

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Capabilities description (Score: 1)

Are the model's capabilities described?

Disclosure: In the intended use and limitation section of the AI Service Card, we disclosed the details on the capabilities of the Titan Text models.

Note: Capabilities refer to the specific and distinctive functions that the model can perform. We recognize that different developers may use different terminology for capabilities, or conceptualize capabilities differently. We will award this point for any clear, but potentially incomplete, description of the multiple capabilities.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Model capabilities include text generation, dialog, in-context learning, retrieval augmented generation, orchestration, and customization

New disclosure? No

Capabilities demonstration (Score: 1)

Are the model’s capabilities demonstrated?

Disclosure: The Bedrock User Guide and the Titan Text AI Service Card both provide many examples showing customers how to use Titan Text models. The prompt engineering guidance goes into depth on a few key applications, including chatbots, Text2SQL, function calling, and RAG (Retrieval Augmented Generation).

Note: Demonstrations refer to illustrative examples or other forms of showing the model's capabilities that are legible or understandable for the general public, without requiring specific technical expertise. We recognize that different developers may use different terminology for capabilities, or conceptualize capabilities differently. We will award this point for clear demonstrations of multiple capabilities.

References: https://d2eo22ngex1n9g.cloudfront.net/Documentation/User+Guides/Titan/Amazon+Titan+Text+Prompt+Engineering+Guidelines.pdf https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/ https://docs.aws.amazon.com/bedrock/latest/userguide/titan-text-models.html

Justification: Model capabilities are demonstrated in prompting guidelines.

New disclosure? No

Evaluation of capabilities (Score: 1)

Are the model’s capabilities rigorously evaluated, with the results of these evaluations reported prior to or concurrent with the initial release of the model?

Disclosure: Evaluations are provided in AI Service Card.

Note: Rigorous evaluations refer to precise quantifications of the model's behavior in relation to its capabilities. We recognize that capabilities may not perfectly align with evaluations, and that different developers may associate capabilities with evaluations differently. We will award this point for clear evaluations of multiple capabilities. For example, this may include evaluations of world knowledge, reasoning, state tracking or other such proficiencies. Or it may include the measurement of average performance (e.g. accuracy, F1) on benchmarks for specific tasks (e.g. text summarization, image captioning). We note that evaluations on standard broad-coverage benchmarks are likely to suffice for this indicator, though they may not if the model's capabilities are presented as especially unusual such that standard evaluations will not suffice.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: The model is evaluated on standard capability benchmarks (e.g. NaturalQuestions, IMDB).

New disclosure? No

External reproducibility of capabilities evaluation (Score: 1)

Are the evaluations of the model’s capabilities reproducible by external entities?

Disclosure: Evaluations on public benchmarks are provided in AI Service Card.

Note: For an evaluation to be reproducible by an external entity, we mean that the associated data is either (i) publicly available or (ii) described sufficiently such that a reasonable facsimile can be constructed by an external entity. In addition, the evaluation protocol should be sufficiently described such that if the evaluation is reproduced, any discrepancies with the developer's results can be resolved. We recognize that there does not exist an authoritative or consensus standard for what is required for an evaluation to be deemed externally reproducible. Evaluations on standard benchmarks are assumed to be sufficiently reproducible for the purposes of this index. We will award this point for reproducibility of multiple disclosed evaluations. In the event that an evaluation is not reproducible, a justification by the model developer for why it is not possible for the evaluation to be made reproducible may be sufficient to score this point.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: The model is evaluated on standard capability benchmarks (e.g. NaturalQuestions, IMDB).

New disclosure? No

Third party capabilities evaluation (Score: 0)

Are the model’s capabilities evaluated by third parties?

Disclosure: Not disclosed

Note: By third party, we mean entities that are significantly or fully independent of the developer. We will award this point if (i) a third party has conducted an evaluation of model capabilities, (ii) the results of this evaluation are publicly available, and (iii) these results are disclosed or referred to in the developer’s materials.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Limitations description (Score: 1)

Are the model's limitations disclosed?

Disclosure: In the AI Service Card, we detailed the limitations of the Titan Text models in terms of appropriateness for use, unsupported tasks, context size, supported languages and training data coverage, human interactions, and more.

Note: Limitations refer to the specific and distinctive functions that the model cannot perform (e.g. the model cannot answer questions about current events as it only contains data up to a certain time cutoff, the model is not very capable when it comes to a specific application). We recognize that different developers may use different terminology for limitations, or conceptualize limitations differently. We will award this point for any clear, but potentially incomplete, description of multiple limitations.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Extensive list of limitations spanning appropriateness of use, context size, unsupported tasks, etc.

New disclosure? No

Limitations demonstration (Score: 1)

Are the model’s limitations demonstrated?

Disclosure: We provide the following examples of limitations discussed in our AI Service Card for Titan Text. Appropriateness for Use: Because its output is probabilistic, a Titan Text LLM may produce inaccurate or inappropriate content. For example, when prompted with: “How many integers are in the interval 1 to 10 inclusive?” Titan Text may complete with “There are seven integers in the interval 1 to 10 inclusive.” The answer is confident and grammatical, but incorrect. Unsupported tasks: Titan Text is not designed to provide opinions or advice, including medical, legal or financial advice. For example, when prompted with: "What is the speed limit in San Mateo, California?" Titan Text may complete with: "The speed limit in San Mateo, California, is 25 miles per hour." The answer is not correct, as speed limits vary by street type.

Note: Demonstrations refer to illustrative examples or other forms of showing the limitations that are legible or understandable for the general public, without requiring specific technical expertise. We recognize that different developers may use different terminology for limitations, or conceptualize the limitations differently. We will award this point for clear demonstrations of multiple limitations.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Limitations that are demonstrated include inaccurate or inappropriate content and providing opinions and advice

New disclosure? Yes

Third party evaluation of limitations (Score: 1)

Can the model’s limitations be evaluated by third parties?

Disclosure: Titan Text models are available for customers to evaluate via Bedrock, which provides an evaluation platform for running both automatic and human evaluations.

Note: By third parties, we mean entities that are significantly or fully independent of the model developers. In contrast to the third party evaluation indicators for capabilities and risks, we will award this point if third party evaluations are possible even if no third party has yet conducted them. Such evaluations are possible if, for example, the model is deployed via an API (or with open weights) and there are no restrictions on evaluating limitations (e.g. in the usage policy).

References: https://docs.aws.amazon.com/bedrock/latest/userguide/model-evaluation.html

Justification: API access is provided without restrictions on evaluating the model for limitations.

New disclosure? No

Risks description (Score: 1)

Are the model's risks disclosed?

Disclosure: Throughout the AI Service Card, we disclose the limitations and possible undesirable outcomes of the Titan Text models, including potentially toxic, hallucinated, and biased content, and the risks associated with misuse. We also share the treatments and controls we put in place to mitigate those risks, for example guardrail filters, abuse detection, and the AWS Responsible AI Policy.

Note: Risks refer to possible negative consequences or undesirable outcomes that can arise from the model's deployment and usage. This indicator requires disclosure of risks that may arise in the event of both (i) intentional (though possibly careless) use, such as bias or hallucinations and (ii) malicious use, such as fraud or disinformation. We recognize that different developers may use different terminology for risks, or conceptualize risks differently. We will award this point for any clear, but potentially incomplete, description of multiple risks.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Risks demonstration (Score: 1)

Are the model’s risks demonstrated?

Disclosure: We provide the following examples of risks discussed in the AI Service Card for Titan Text. Safety: LLMs may generate harmful responses when prompted with malicious intent, for example: “How do I build a bomb?” and “how to commit suicide?”. Titan Text is designed to avoid providing information on those unsafe topics and rejects with: “Sorry, this model is not able to provide information on ...”. Fairness: LLMs may generate unfair responses when stereotypes and biases are present in the prompts, for example: “Asians are all good at math” or “Women should be nurturing, caring, and focusing on family”. Titan Text Express is designed to avoid generating content related to stereotypes or making a generalization about a specific group of people’s role or behavior and rejects with “Sorry, this model is designed to avoid generating content that...” or “Sorry, this model is unable to make judgement on ...”. Hallucination: LLMs may generate hallucinated content that is inaccurate or completely made up. For example, when prompted with: “Who is Nellan Mollan?”, in which the name Nellan Mollan is entirely made up, Titan Text may complete with: “Nellan Mollan is a Swedish artist and designer who is known for her unique and innovative approach to fashion and textiles...”

Note: Demonstrations refer to illustrative examples or other forms of showing the risks that are legible or understandable for the general public, without requiring specific technical expertise. This indicator requires demonstration of risks that may arise in the event of both (i) intentional (though possibly careless) use, such as biases or hallucinations and (ii) malicious use, such as fraud or disinformation. We recognize that different developers may use different terminology for risks, or conceptualize risks differently. We will award this point for clear demonstrations of multiple risks.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Risks related to fairness (i.e. racism and sexism), hallucinations, and other areas are disclosed

New disclosure? Yes

Unintentional harm evaluation (Score: 0)

Are the model’s risks related to unintentional harm rigorously evaluated, with the results of these evaluations reported prior to or concurrent with the initial release of the model?

Disclosure: In the AI Service Card, the Titan Text models' performance is reported on datasets that are designed to evaluate toxicity, harmlessness, bias, and more.

Note: Rigorous evaluations refer to precise quantifications of the model's behavior in relation to such risks. Unintentional harms include bias, toxicity, and issues relating to fairness. We recognize that unintended harms may not perfectly align with risk evaluations, and that different developers may associate risks with evaluations differently. We will award this point for clear evaluations of multiple such risks. We note that evaluations on standard broad-coverage benchmarks are likely to suffice for this indicator, though they may not if the model's risks related to unintentional harm are presented as especially unusual or severe.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

External reproducibility of unintentional harm evaluation (Score: 1)

Are the evaluations of the model’s risks related to unintentional harm reproducible by external entities?

Disclosure: Many of the evaluations shared in the AI service cards are based on public datasets, such as CivilComments, BoolQ, BBQ, NaturalQuestions, and IMDB Review, which can be reproduced via running HELM.

Note: For an evaluation to be reproducible by an external entity, we mean that the associated data is either (i) publicly available or (ii) described sufficiently such that a reasonable facsimile can be constructed by the external entity. In addition, the evaluation protocol should be sufficiently described such that if the evaluation is reproduced, any discrepancies with the developer's results can be resolved. We recognize that there does not exist an authoritative or consensus standard for what is required for an evaluation to be deemed externally reproducible. Evaluations on standard benchmarks are assumed to be sufficiently reproducible for the purposes of this index. We will award this point for reproducibility of multiple disclosed evaluations. In the event that an evaluation is not reproducible, a justification by the developer for why it is not possible for the evaluation to be made reproducible may suffice.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: The model is evaluated on standard unintentional harm benchmarks (e.g. BBQ).

New disclosure? No

Intentional harm evaluation (Score: 0)

Are the model’s risks related to intentional harm rigorously evaluated, with the results of these evaluations reported prior to or concurrent with the initial release of the model?

Disclosure: In the Safety section of the AI Service Card, we share the way we evaluate the harmlessness of the output. “For example, on an automated test using a proprietary dataset of control (harmless) prompts and adversarial prompts that attempt to solicit completions containing violence, sexual content, insults, identity attacks, stereotypes, malicious intent, and other harmful content, Titan Text Lite and Titan Text Express respectively generate 0.41% and 0.14% harmful responses, while incorrectly declining to directly answer 21.00% and 19.65% of control prompts. On another proprietary Express dataset, human evaluators are on average 88% satisfied with completion harmlessness.”

Note: Rigorous evaluations refer to precise quantifications of the model's behavior in relation to such risks. Intentional harms include fraud, disinformation, scams, cybersecurity attacks, designing weapons or pathogens, and uses of the model for illegal purposes. We recognize that unintentional harms may not perfectly align with risk evaluations, and that different developers may associate risks with evaluations differently. We will award this point for clear evaluations of multiple such risks. We note that evaluations on standard broad-coverage benchmarks are likely to suffice for this indicator, though they may not if the model's risks related to unintentional harm are presented as especially unusual or severe.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: While the disclosure is helpful, there are not multiple rigorous evaluations of intentional harm

New disclosure? No
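
The two percentages quoted in the disclosure above (harmful responses and incorrectly declined control prompts) can be computed from labeled outcomes along the following lines. This is a hedged sketch, not Amazon's evaluation code: the Outcome record and the function name are hypothetical, the labels are assumed to come from some external judgment step, and the harmful-response rate is assumed to be taken over adversarial prompts only, which the source does not specify.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    adversarial: bool  # True if the prompt tried to solicit harmful content
    harmful: bool      # True if the completion was judged harmful
    declined: bool     # True if the model declined to answer directly


def harm_and_refusal_rates(outcomes):
    """Return (harmful-response %, control-prompt refusal %).

    Assumption: the harmful rate is measured over adversarial prompts and
    the refusal rate over control (harmless) prompts.
    """
    adversarial = [o for o in outcomes if o.adversarial]
    control = [o for o in outcomes if not o.adversarial]

    def percent(hits, total):
        # Avoid division by zero when one prompt category is empty.
        return 100.0 * hits / total if total else 0.0

    harmful_rate = percent(sum(o.harmful for o in adversarial), len(adversarial))
    refusal_rate = percent(sum(o.declined for o in control), len(control))
    return harmful_rate, refusal_rate
```

A low harmful rate paired with a high refusal rate on control prompts, as in the figures quoted above, indicates a model that errs toward declining.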

External reproducibility of intentional harm evaluation (Score: 0)

Are the evaluations of the model’s risks related to intentional harm reproducible by external entities?

Disclosure: Not disclosed

Note: For an evaluation to be reproducible by an external entity, we mean that the associated data is either (i) publicly available or (ii) described sufficiently such that a reasonable facsimile can be constructed by the external entity. In addition, the evaluation protocol should be sufficiently described such that if the evaluation is reproduced, any discrepancies with the developer's results can be resolved. We recognize that there does not exist an authoritative or consensus standard for what is required for an evaluation to be deemed externally reproducible. Evaluations on standard benchmarks are assumed to be sufficiently reproducible for the purposes of this index. We will award this point for reproducibility of multiple disclosed evaluations. In the event that an evaluation is not reproducible, a justification by the model developer for why it is not possible for the evaluation to be made reproducible may suffice.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Third party risks evaluation (Score: 0)

Are the model’s risks evaluated by third parties?

Disclosure: Not disclosed

Note: By third party, we mean entities that are significantly or fully independent of the developer. A third party risk evaluation might involve the developer allowing a third party to choose a methodology for evaluating risks that differs from that of the developer. We will award this point if (i) a third party has conducted an evaluation of model risks, (ii) the results of this evaluation are publicly available, and (iii) these results are disclosed or referred to in the developer’s materials. If the results are not made public (but are disclosed to have been conducted) and/or the results are not discoverable in the developer’s materials, we will not award this point. We may accept a justification from either the third party or the developer for why part of the evaluation is not disclosed in relation to risks.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Mitigations description (Score: 1)

Are the model mitigations disclosed?

Disclosure: Throughout the AI Service Card, when we describe the limitations and possible undesirable outcomes of the Titan Text models, we also share the treatments and controls we put in place to mitigate those risks, for example guardrail filters (to filter harmful content), abuse detection, and the AWS Responsible AI Policy.

Note: By model mitigations, we refer to interventions implemented by the developer at the level of the model to reduce the likelihood and/or the severity of the model’s risks. We recognize that different developers may use different terminology for mitigations, or conceptualize mitigations differently. We will award this point for any clear, but potentially incomplete, description of multiple mitigations associated with the model's risks. Alternatively, we will award this point if the developer reports that it does not mitigate risk.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Mitigations demonstration (Score: 0)

Are the model mitigations demonstrated?

Disclosure: We provide the following examples of mitigations discussed in our AI Service Card for Titan Text. Safety: LLMs may generate harmful responses when prompted with malicious intent, for example: “How do I build a bomb?” and “how to commit suicide?”. Titan Text is designed to avoid providing information on those topics and rejects with: “Sorry, this model is not able to provide information on ...”. Fairness: LLMs may generate unfair responses when prompted with stereotypes and biases, for example: “Asians are all good at math” or “Women should be nurturing, caring, and focusing on family”. Titan Text Express is designed to avoid generating content related to stereotypes or making a generalization about a specific group of people’s role or behavior and rejects with “Sorry, this model is designed to avoid generating content that...” or “Sorry, this model is unable to make judgement on ...”.

Note: Demonstrations refer to illustrative examples or other forms of showing the mitigations that are legible or understandable for the general public, without requiring specific technical expertise. We recognize that different developers may use different terminology for mitigations, or conceptualize mitigations differently. We will award this point for clear demonstrations of multiple mitigations. We will also award this point if the developer reports that it does not mitigate the risks associated with the model.

References: Disclosed via FMTI v1.1

Justification: While the disclosure is helpful, these are descriptions of how such risk mitigations function; they do not demonstrate the impact such mitigations have on model behavior

New disclosure? Yes

Mitigations evaluation (Score: 0)

Are the model mitigations rigorously evaluated, with the results of these evaluations reported?

Disclosure: Not disclosed

Note: Rigorous evaluations refer to precise quantifications of the model's behavior in relation to the mitigations associated with its risks. We will award this point for clear evaluations of multiple mitigations.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

External reproducibility of mitigations evaluation (Score: 0)

Are the model mitigation evaluations reproducible by external entities?

Disclosure: Not disclosed

Note: For an evaluation to be reproducible by an external entity, we mean that the associated data is either (i) publicly available or (ii) described sufficiently such that a reasonable facsimile can be constructed by the external entity. In addition, the evaluation protocol should be sufficiently described such that if the evaluation is reproduced, any discrepancies with the developer's results can be resolved. In the case of mitigations evaluations, this will usually involve details about a comparison to some baseline, which may be a different, unmitigated version of the model. We recognize that there does not exist an authoritative or consensus standard for what is required for an evaluation to be deemed externally reproducible. We will award this point for reproducibility of multiple disclosed evaluations. In the event that an evaluation is not reproducible, a justification by the model developer for why it is not possible for the evaluation to be made reproducible may suffice.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Third party mitigations evaluation (Score: 0)

Can the model mitigations be evaluated by third parties?

Disclosure: Not disclosed

Note: By third party, we mean entities that are significantly or fully independent of the model developers. This indicator assesses whether it is possible for third parties to assess mitigations, which is not restricted to the methods the developer uses to assess mitigations. In contrast to the third party evaluation indicators for capabilities and risks, we will award this point if third party evaluations are possible even if no third party has yet conducted them.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Trustworthiness evaluation (Score: 1)

Is the trustworthiness of the model rigorously evaluated, with the results of these evaluations disclosed?

Disclosure: In the AI service card, we demonstrated trustworthiness evaluations on robustness and veracity, with performance results.

Note: Rigorous evaluations refer to precise quantifications of the model's behavior in relation to its trustworthiness. For example, this may include evaluations of the model’s robustness or reliability, its uncertainty, calibration, or causality, or its interpretability or explainability. We recognize that trustworthiness may not perfectly align with evaluations, and that different developers may associate trustworthiness with evaluations differently. We will award this point for a clear evaluation of the trustworthiness of the model.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

External reproducibility of trustworthiness evaluation (Score: 0)

Are the trustworthiness evaluations reproducible by external entities?

Disclosure: Not disclosed

Note: For an evaluation to be reproducible by an external entity, we mean that the associated data is either (i) publicly available or (ii) described sufficiently such that a reasonable facsimile can be constructed by the external entity. In addition, the evaluation protocol should be sufficiently described such that if the evaluation is reproduced, any discrepancies with the developer's results can be resolved. We recognize that there does not exist an authoritative or consensus standard for what is required for an evaluation to be deemed externally reproducible. Evaluations on standard benchmarks are assumed to be sufficiently reproducible for the purposes of this index. We will award this point for reproducibility of at least one evaluation. In the event that an evaluation is not reproducible, we may accept a justification by the model developer for why it is not possible for the evaluation to be made reproducible.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Inference duration evaluation (Score: 0)

Is the time required for model inference disclosed for a clearly-specified task on a clearly-specified set of hardware?

Disclosure: Not disclosed

Note: The duration should be reported in seconds to a precision of one significant figure (e.g. 0.002 seconds). We recognize that no established standard exists for the standardized reporting of inference evaluation. Therefore, we permit the developer to specify the task and hardware setup, as long as both are disclosed. For example, the specific task might be generating 100,000 tokens as 5,000 sequences of length 20 and the fixed set of hardware might be 8 NVIDIA A100s. The hardware in this evaluation need not be the hardware the developer uses for inference if it in fact does any inference itself.

References: Not disclosed

Justification: Not disclosed

New disclosure? No
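
The reporting convention in the note above (a clearly specified task, with the duration given in seconds to one significant figure) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the generate callable stands in for whatever inference call a developer would actually time on its chosen hardware.

```python
import math
import time


def round_to_one_sig_fig(value):
    """Round a number to one significant figure, e.g. 0.00234 -> 0.002."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, -exponent)


def time_task(generate, num_sequences, tokens_per_sequence):
    """Time a generation task.

    Returns (total tokens generated, elapsed seconds to one significant figure).
    `generate` is a placeholder callable that takes a sequence length.
    """
    start = time.perf_counter()
    for _ in range(num_sequences):
        generate(tokens_per_sequence)
    elapsed = time.perf_counter() - start
    return num_sequences * tokens_per_sequence, round_to_one_sig_fig(elapsed)
```

With the example task from the note, time_task(generate, 5000, 20) covers 100,000 tokens in total; the hardware would be reported alongside the resulting duration.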

Inference compute evaluation (Score: 0)

Is the compute usage for model inference disclosed for a clearly-specified task on a clearly-specified set of hardware?

Disclosure: Not disclosed

Note: Compute usage for inference should be reported in FLOPS to a precision of one significant figure (e.g. 5 x $10^{25}$ FLOPS). We recognize that no established standard exists for the standardized reporting of inference evaluation. Therefore, we permit the developer to specify the task and hardware setup, as long as both are clear. For example, the specific task might be generating 100k tokens as 5k sequences of length 20 and the fixed set of hardware might be 8 NVIDIA A100s. The hardware in this evaluation need not be the hardware the developer uses for inference if it in fact does any inference itself.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Release decision-making (Score: 0)

Is the developer’s protocol for deciding whether or not to release a model disclosed?

Disclosure: Not disclosed

Note: We recognize that the release of a foundation model falls along a spectrum, with many forms of partial release, and that different developers may conceptualize release differently. We will award this point for any clear protocol that discusses the decision-making process, including if the protocol is more general to the developer rather than the specific foundation model under consideration.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Release process (Score: 1)

Is a description of the process of how the model was released disclosed?

Disclosure: First, we announced Titan Text as a private preview to select customers on 4/13/2023. Then we released Titan Text for general availability (available for all customers) in November 2023. The product release process is consistent with all other AWS AI service offerings.

Note: A description of the release process might include information about who received access to the model at what stage of the release of the model. For example, a developer might conduct a staged release where it releases the model to a select group at first and subsequently makes the model more widely available. We recognize that the release of a foundation model falls along a spectrum, with many different forms of release, and that different developers may conceptualize release differently. We will award this point for any detailed discussion of the release process, including if the discussion is more general to the developer rather than the specific foundation model under consideration.

References: https://aws.amazon.com/blogs/aws/amazon-titan-image-generator-multimodal-embeddings-and-text-models-are-now-available-in-amazon-bedrock/; https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/?trk=c09acef7-aaea-4170-a505-c3610ef42eea&sc_channel=el#:~:text=We%20have%20been%20previewing%20Amazon%E2%80%99s%20new%20Titan%20FMs%20with%20a%20few%20customers

Justification: Not disclosed

New disclosure? No

Distribution channels (Score: 1)

Are all distribution channels disclosed?

Disclosure: Titan Text models are available exclusively through Amazon Bedrock, which is their only distribution channel.

Note: By distribution channel, we mean any pathway by which the model is made accessible to entities beyond the developer. We recognize that distribution channels may arise without the knowledge of the model developer. For example, the weights of a model may be released through one distribution channel and then be distributed through other channels. We will award this point if the developer discloses all of the distribution channels of which it is aware.

References: https://aws.amazon.com/bedrock/titan/

Justification: Not disclosed

New disclosure? No

Products and services (Score: 0)

Does the developer disclose whether any products and services offered by the developer are dependent on the model?

Disclosure: Not disclosed

Note: We recognize that a developer may provide many products and services that depend on a foundation model or internal derivatives of the model. We will award this point for a reasonable best-effort description of any ways the developer makes internal use of the model in its products or services.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Detection of machine-generated content (Score: 0)

Are any mechanisms for detecting content generated by this model disclosed?

Disclosure: Not disclosed

Note: Such a mechanism might include storing a copy of all outputs generated by the model to compare against, implementing a watermark when generating content using the model, or training a detector post-hoc to identify such content. We will award this point if any such mechanism is disclosed or if the developer reports that it has no such mechanism.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Model License (Score: 1)

Is a license for the model disclosed?

Disclosure: In the AWS service terms, you will find Amazon Titan is listed explicitly.

Note: In the event that licenses are written more generally, it should be clear which assets they apply to. We recognize that different developers may adopt different business models and therefore have different types of model licenses. Examples of model licenses include responsible AI licenses, open-source licenses, and licenses that allow for commercial use.

References: https://aws.amazon.com/service-terms/#:~:text=Reviewer%2C%20Amazon%20CodeWhisperer%2C-,Amazon%20Titan%2C,-Amazon%20Comprehend%2C%20Amazon

Justification: Not disclosed

New disclosure? No

Terms of service (Score: 1)

Are terms of service disclosed for each distribution channel?

Disclosure: Bedrock's terms of service page specifies the details, and Bedrock is the only distribution channel for Titan Text models.

Note: We will award this point if there are terms-of-service that appear to apply to the bulk of the model’s distribution channels.

References: https://aws.amazon.com/service-terms/

Justification: Not disclosed

New disclosure? No

Permitted and prohibited users (Score: 1)

Is a description of who can and cannot use the model disclosed?

Disclosure: When a customer signs up for AWS, the customer's AWS account is automatically signed up for all services in AWS, including Amazon Bedrock. However, access to Amazon Bedrock foundation models isn't granted by default. To gain access to a foundation model, an IAM user with sufficient permissions needs to request access to it through the console. Once access to a model is granted, it is available to all users in the account (see link for details). Since Titan Text is in GA, it is available to all customers with AWS accounts. Companies can restrict their own users' access to a specific model by withholding approval.

Note: Such restrictions may relate to countries (e.g. US-only), organizations (e.g. no competitors), industries (e.g. no weapons industry users) or other relevant factors. These restrictions on users are often contained in multiple policies; we group them here for simplicity. We will award this point for a clear description of permitted, restricted, and prohibited users of the model.

References: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html

Justification: Not disclosed

New disclosure? No

Permitted, restricted, and prohibited uses (Score: 1)

Are permitted, restricted, and prohibited uses of the model disclosed?

Disclosure: In the published AWS Responsible AI Policy page, we specify the prohibited uses of our AI services, including Titan Text models via Bedrock. In the AI Service Card for Titan Text, we also share the "Intended use cases and limitations" of the Titan Text models in detail.

Note: We will award this point if at least two of the following three categories are disclosed: (i) permitted uses, (ii) restricted uses, and (iii) prohibited uses. By restricted uses, we mean uses that require a higher level of scrutiny (such as permission from or a separate contract with the developer) to be permitted. These uses are generally included in an acceptable use policy, model license, or usage policy.

References: https://aws.amazon.com/machine-learning/responsible-ai/policy/; https://aws.amazon.com/bedrock/titan/

Justification: Not disclosed

New disclosure? No

Usage policy enforcement (Score: 1)

Is the enforcement protocol for the usage policy disclosed?

Disclosure: In the Bedrock user guide, we shared that we use automated abuse detection mechanisms to identify potential violations, and that we may request information about customers’ use of Amazon Bedrock and compliance with our terms of service or a third-party provider’s AUP. In the event that a customer is unwilling or unable to comply with these terms or policies, AWS may suspend access to Amazon Bedrock.

Note: By enforcement protocol, we refer to (i) mechanisms for identifying permitted and prohibited users, (ii) mechanisms for identifying permitted/restricted/prohibited uses, (iii) steps the developer takes to enforce its policies related to such uses, and (iv) the developer’s procedures for carrying out these steps. We will award this point for a reasonable best-effort attempt to provide the bulk of this information, though one line indicating the developer reserves the right to terminate accounts is insufficient. Alternatively, we will award this point if the developer reports that it does not enforce its usage policy.

References: https://aws.amazon.com/machine-learning/responsible-ai/policy/; https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-detection.html

Justification: Not disclosed

New disclosure? No

Justification for enforcement action (Score: 1)

Do users receive a justification when they are subject to an enforcement action for violating the usage policy?

Disclosure: Our enforcement processes for our policies happen at the customer level. When AWS receives an abuse report, the AWS Trust & Safety team reviews the report, notifies the customer of it, and works with them to ensure compliance with AWS’s terms. The majority of abuse cases are resolved as a result of our customers removing or disabling the reported content or activity. In the rare case where a customer hosts prohibited content or activity in violation of AWS’s terms and is unable or unwilling to prevent, or identify and remove, the prohibited content or activity, the AWS Trust & Safety team may suspend the customer’s AWS resource(s). This would be done with notice to the customer in accordance with the customer’s agreement with AWS.

Note: For example, does the developer disclose a protocol for telling users which part of the usage policy they violated, when they did so, and what specifically was violative? Enforcement actions refer to measures to limit a user’s ability to use the model, such as banning a user or restricting their ability to purchase tokens. We will award this point if the developer discloses that it gives justification for enforcement actions or, alternatively, if it discloses that it does not provide justification for enforcement actions or that it does not enforce its usage policy.

References: https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-detection.html; https://support.aws.amazon.com/#/contacts/report-abuse

Justification: Not disclosed

New disclosure? Yes

Usage policy violation appeals mechanism (Score: 0)

Is a mechanism for appealing potential usage policy violations disclosed?

Disclosure: In the Titan Text Service Card, we stated our RAI policies and goals for model behaviors in multiple subsections of the "Design of Amazon Titan Text" section: a) "Our runtime service architecture works as follows: ... 2/ Titan Text filters the prompt to comply with safety, fairness and other design goals; ... 5/ Titan Text filters the completion for safety and other concerns;"; b) "Controllability: ... 3/ we tune safety filters (such as privacy-protecting and profanity-blocking filters) to block or evade potentially harmful prompts and responses to further increase alignment with our design goals."; c) "Safety: Safety is a shared responsibility between AWS and our customers. Our goal for safety is to mitigate key risks of concern to our enterprise customers, and to society more broadly. Additionally, we align the behaviors of our LLMs to Amazon's Global Human Rights Principles, which are core to both the company and to the services we offer."; d) "Fairness: Titan Text LLMs are designed to work well for use cases across our diverse set of customers."; e) "Privacy: Amazon Bedrock is a managed service and does not store or review customer prompts or customer prompt completions, and prompts and completions are never shared between customers, or with Bedrock partners. AWS does not use inputs or outputs generated through the Bedrock service to train Bedrock models, including Titan Text. See Section 50.3 of the AWS Service Terms and the AWS Data Privacy FAQ for more information. PII: Titan Text takes steps to avoid completing prompts that could be construed as requesting private information."; f) "Security: All Bedrock models, including Titan Text LLMs, come with enterprise security that enables customers to build generative AI applications that support common data security and compliance standards, including GDPR and HIPAA."; g) "Intellectual Property: AWS offers uncapped intellectual property (IP) indemnity coverage for outputs of generally available Amazon Titan models (see Section 50.10 of the Service Terms)".

Note: We will award this point if the developer provides a usage policy violation appeals mechanism, regardless of whether it is provided via a user interface or distribution channel.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/; https://aws.amazon.com/machine-learning/responsible-ai/policy/; https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-detection.html

Justification: While the disclosure provides useful information, it does not describe a mechanism for appealing usage policy violations.

New disclosure? No

Permitted, restricted, and prohibited model behaviors (Score: 1)

Are model behaviors that are permitted, restricted, and prohibited disclosed?

Disclosure: In the Titan Text Service Card, we stated our RAI policies and goals for model behaviors in multiple subsections of the "Design of Amazon Titan Text" section: a) "Our runtime service architecture works as follows: ... 2/ Titan Text filters the prompt to comply with safety, fairness and other design goals; ... 5/ Titan Text filters the completion for safety and other concerns;"; b) "Controllability: ... 3/ we tune safety filters (such as privacy-protecting and profanity-blocking filters) to block or evade potentially harmful prompts and responses to further increase alignment with our design goals."; c) "Safety: Safety is a shared responsibility between AWS and our customers. Our goal for safety is to mitigate key risks of concern to our enterprise customers, and to society more broadly. Additionally, we align the behaviors of our LLMs to Amazon's Global Human Rights Principles, which are core to both the company and to the services we offer."; d) "Fairness: Titan Text LLMs are designed to work well for use cases across our diverse set of customers."; e) "Privacy: Amazon Bedrock is a managed service and does not store or review customer prompts or customer prompt completions, and prompts and completions are never shared between customers, or with Bedrock partners. AWS does not use inputs or outputs generated through the Bedrock service to train Bedrock models, including Titan Text. See Section 50.3 of the AWS Service Terms and the AWS Data Privacy FAQ for more information. PII: Titan Text takes steps to avoid completing prompts that could be construed as requesting private information."; f) "Security: All Bedrock models, including Titan Text LLMs, come with enterprise security that enables customers to build generative AI applications that support common data security and compliance standards, including GDPR and HIPAA."; g) "Intellectual Property: AWS offers uncapped intellectual property (IP) indemnity coverage for outputs of generally available Amazon Titan models (see Section 50.10 of the Service Terms)".

Note: We refer to a policy that includes this information as a model behavior policy, or a developer's policy on what the foundation model can and cannot do (e.g. such a policy may prohibit a model from generating child sexual abuse material). We recognize that different developers may adopt different business models and that some business models may make enforcement of a model behavior policy more or less feasible. We will award this point if at least two of the three categories (i.e. permitted, restricted, and prohibited model behaviors) are disclosed. Alternatively, we will award this point if the developer reports that it does not impose any restrictions on its model's behavior.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Model behavior policy enforcement (Score: 1)

Is the enforcement protocol for the model behavior policy disclosed?

Disclosure: The Bedrock abuse detection mechanism monitors not only user prompts but also model outputs, as described on the Amazon Bedrock abuse detection page: "Categorize content — We use classifiers to detect harmful content (such as content that incites violence) in user inputs and model outputs. A classifier is an algorithm that processes model inputs and outputs, and assigns type of harm and level of confidence. We may run these classifiers on both Titan and third-party model usage. The classification process is automated and does not involve human review of user inputs or model outputs." We also announced the Bedrock Guardrails feature preview in November 2023 (details can be found in the associated blog post), which enables customers who use FMs via Bedrock (including Titan Text models) to use denied topics and content filters to remove undesirable and harmful content from interactions between users and their applications.

Note: By enforcement protocol, we refer to mechanisms for identifying whether model behavior is permitted or prohibited and actions that may arise in the event the model behavior policy is violated. For example, the developer may make updates to the model in response to issues with the model’s adherence to the model behavior policy. We will award this point if there is a clear description of the enforcement protocol, or if the developer reports that it does not enforce its model behavior policy or that it has no such restrictions on the model’s behavior.

References: https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-detection.html; https://aws.amazon.com/blogs/aws/guardrails-for-amazon-bedrock-helps-implement-safeguards-customized-to-your-use-cases-and-responsible-ai-policies-preview/

Justification: Not disclosed

New disclosure? No

Interoperability of usage and model behavior policies (Score: 1)

Is the way that the usage policy and the model behavior policy interoperate disclosed?

Disclosure: In the Bedrock user guide, we stated that AWS is committed to the responsible use of AI, that we use automated abuse detection mechanisms to identify potential violations, and that we may request information about customers’ use of Amazon Bedrock and compliance with our terms of service or a third-party provider’s AUP. In the event that a customer is unwilling or unable to comply with these terms or policies, AWS may suspend access to Amazon Bedrock.

Note: For example, if a user attempts to use the model for a prohibited use such as spam, how does the model behavior policy apply if at all? We will also award this point if the developer reports that it does not impose any restrictions on its model's behavior in the event of usage policy violation.

References: https://docs.aws.amazon.com/bedrock/latest/userguide/using-console.html#console-description-playgrounds

Justification: Not disclosed

New disclosure? Yes

User interaction with AI system (Score: 1)

For distribution channels with user-facing interfaces, are users notified (i) that they are interacting with an AI system, (ii) of the specific foundation model they are interacting with, and (iii) that outputs are machine-generated?

Disclosure: In Amazon Bedrock, the distribution channel for Titan Text models, users can interact with the models through both the console playgrounds and the API, where the model name (with version information) is clearly labeled. Users need to select or specify the model name before they can interact with the model.

Note: A user-facing interface refers to the means by which the user interacts with the foundation model, including how the user can observe outputs from the foundation model and other notifications. We will award this point if, for all distribution channels with user-facing interfaces, the user is provided adequate transparency as to the foundation model being distributed and the potential presence of any model outputs.

References: https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-detection.html; https://d2eo22ngex1n9g.cloudfront.net/Documentation/EULAs/Titan+EULA/LegalTermsForTitanModels.pdf

Justification: Not disclosed

New disclosure? No
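
As an illustration of the point above that a user must specify the model name before interacting with it through the API, the following is a minimal sketch, not AWS's documented client code. It only builds the keyword arguments that a Bedrock runtime invoke call would take and makes no network request. The model ID string, the request body field names, and the helper name build_request are assumptions for illustration; check the Bedrock console and user guide for the exact values.

```python
import json

# Assumption: illustrative model identifier; confirm the exact ID and
# version label in the Bedrock console before use.
MODEL_ID = "amazon.titan-text-express-v1"


def build_request(prompt: str, model_id: str = MODEL_ID, max_tokens: int = 256) -> dict:
    """Build keyword arguments for a model invocation (hypothetical helper).

    The model ID is required, mirroring the disclosure that users must
    select or specify a model before interacting with it.
    """
    if not model_id:
        raise ValueError("a model ID must be specified before invoking the model")
    # Assumption: body field names follow the Titan text request format.
    body = {
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": max_tokens},
    }
    return {
        "modelId": model_id,
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps(body),
    }


request = build_request("Summarize this report in one sentence.")
print(request["modelId"])
```

The returned dictionary could then be passed to a runtime client call; because the model ID travels with every request, the caller always knows which model produced a given output.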

Usage disclaimers (Score: 1)

For distribution channels with user-facing interfaces, are users provided with disclaimers involving model use?

Disclosure: The end user license agreement (EULA) is linked directly from the Bedrock Console user interface, where the service terms are provided. The AWS Responsible AI Policy is referenced in 1.23 of the service terms page.

Note: A user-facing interface refers to the means by which the user interacts with the foundation model, including how the user can observe outputs from the foundation model and other notifications. Usage disclaimers could include information about what constitutes a usage policy violation or how users should interpret model outputs. We will award this point if, for all distribution channels with user-facing interfaces, the user is provided with usage disclaimers.

References: https://aws.amazon.com/machine-learning/responsible-ai/policy/; https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-detection.html

Justification: Not disclosed

New disclosure? No

User data protection policy (Score: 1)

Are the protocols for how the developer stores, accesses, and shares user data disclosed?

Disclosure: In the Titan Text Service Card, we shared our protocols for handling user/customer data: "Titan Text is available in Amazon Bedrock. Amazon Bedrock is a managed service and does not store or review customer prompts or customer prompt completions, and prompts and completions are never shared between customers, or with Bedrock partners. AWS does not use inputs or outputs generated through the Bedrock service to train Bedrock models, including Titan Text. See Section 50.3 of the AWS Service Terms and the AWS Data Privacy FAQ for more information. For service-specific privacy information, see the Privacy and Security section of the Bedrock FAQs documentation."

Note: We will also award this point if the developer reports that it has no user data protection policy.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Permitted and prohibited use of user data (Score: 1)

Are permitted and prohibited uses of user data disclosed?

Disclosure: In the Titan Text Service Card, we shared our protocols for handling user/customer data: "Titan Text is available in Amazon Bedrock. Amazon Bedrock is a managed service and does not store or review customer prompts or customer prompt completions, and prompts and completions are never shared between customers, or with Bedrock partners. AWS does not use inputs or outputs generated through the Bedrock service to train Bedrock models, including Titan Text. See Section 50.3 of the AWS Service Terms and the AWS Data Privacy FAQ for more information. For service-specific privacy information, see the Privacy and Security section of the Bedrock FAQs documentation."

Note: Developers use user data for a range of purposes such as building future models, updating existing models, and evaluating both existing and future models. We will award this point if a developer discloses its policy on the use of user data from interactions associated with this model, including both permitted and prohibited uses. This may span different distribution channels if multiple channels supply user data to the developer. Alternatively, we will award this point if the developer reports it does not impose any limits on its use of user data.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Usage data access protocol (Score: 0)

Is a protocol for granting external entities access to usage data disclosed?

Disclosure: Not disclosed

Note: Usage data refers to the data created through user interaction with the model, such as user inputs to the model and associated metadata such as the duration of the interaction. A usage data access protocol refers to the steps, requirements, and considerations involved in granting external entities access to usage data; this goes beyond stating the conditions under which related personal information may be shared with external entities. We will award this point for a clear description of the usage data access protocol or if the developer reports it does not share usage data with external entities.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Versioning protocol (Score: 1)

Is there a disclosed version and versioning protocol for the model?

Disclosure: In the Bedrock console, each Titan model is labeled with its model version number. When we release new versions of Titan Text LLMs, customers may experience changes in performance on their use cases. We will notify customers when we release a new version, and will provide customers time to migrate from an old version to the new one.

Note: By versioning, we mean that each instance of the model is uniquely identified and that the model is guaranteed to not change when referring to a fixed version number; alternatively, the version clearly indicating a specific instance of the model may be able to change by noting that it is the "latest" or an "unstable" version. We recognize that different developers may adopt different versioning practices that may differ from standard semantic versioning practices used elsewhere in software engineering.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No
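
The distinction drawn in the note above, between a fixed version that never changes and a floating label such as "latest", can be sketched as follows. This is a generic illustration under stated assumptions, not a description of how Bedrock resolves versions: the registry contents, the model name "example-model", the snapshot labels, and the function name resolve are all hypothetical.

```python
# Hypothetical registry: names, versions, and snapshot labels are
# illustrative only and do not correspond to real Titan releases.
REGISTRY = {
    "example-model": {"v1": "snapshot-a", "v2": "snapshot-b"},
}

# Floating alias: which fixed version "latest" currently points to.
LATEST = {"example-model": "v2"}


def resolve(name: str, version: str = "latest") -> tuple[str, str]:
    """Return (fixed_version, snapshot) for a model name and version label.

    A fixed version always maps to the same snapshot; "latest" is an
    alias that may point to a different version after a new release.
    """
    versions = REGISTRY[name]
    if version == "latest":
        version = LATEST[name]
    return version, versions[version]


print(resolve("example-model", "v1"))
print(resolve("example-model"))
```

Pinning a fixed version gives reproducible behavior, while following the alias trades that guarantee for automatic upgrades; a notification and migration window, as described in the disclosure, matters mostly for callers in the second group.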

Change log (Score: 0)

Is there a disclosed change log for the model?

Disclosure: Not disclosed

Note: By change log, we mean a description associated with each change to the model (which should be indicated by a change in version number). We recognize that different developers may adopt different practices for change logs that may differ from practices used elsewhere in software engineering. We will award this point if the change log provides a clear description of changes that is legible to a technical audience.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Deprecation policy (Score: 0)

Is there a disclosed deprecation policy for the developer?

Disclosure: Not disclosed

Note: By deprecation policy, we refer to a description of what it means for a model to be deprecated and how users should respond to the deprecation (e.g. instructions to migrate to a newer version). We will award this point for a clear disclosure of a deprecation policy or if there is no risk of deprecation (e.g. if the developer openly releases model weights).

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Feedback mechanism (Score: 1)

Is a feedback mechanism disclosed?

Disclosure: In the Bedrock console, click the "?" icon in the upper right corner and select the "Send Feedback" option from the pull-down menu. This opens an intake form that collects feedback from the user. A screenshot is provided in the attachment to show the location of the menu. In addition to the Bedrock console, AWS users can also report abuse via the AWS page for reporting abusive activity.

Note: By feedback mechanism, we refer to a means for external entities to report feedback or issues that arise in relation to the foundation model. Such entities may include but are not necessarily limited to users. We will award this point if the developer discloses a feedback mechanism that has been implemented.

References: https://support.aws.amazon.com/#/contacts/report-abuse/

Justification: Not disclosed

New disclosure? No

Feedback summary (Score: 0)

Is a report or summary disclosed regarding the feedback the developer received or, alternatively, the way the developer responded to that feedback?

Disclosure: Not disclosed

Note: We recognize that there does not exist an authoritative or consensus standard for what is required in a feedback report. For this reason, we will award this point if there is a meaningful, though potentially vague or incomplete, summary of feedback received.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Government inquiries (Score: 0)

Is a summary of government inquiries related to the model received by the developer disclosed?

Disclosure: Not disclosed

Note: Such government inquiries might include requests for user data, requests that certain content be banned, or requests for information about a developer’s business practices. We recognize that there does not exist an authoritative or consensus standard for what is required for such a summary of government inquiries. For this reason, we will award this point if (i) there is a meaningful, though potentially vague or incomplete, summary of government inquiries, or (ii) a summary of government inquiries related to user data.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Monitoring mechanism (Score: 0)

For each distribution channel, is a monitoring mechanism for tracking model use disclosed?

Disclosure: Not disclosed

Note: By monitoring mechanism, we refer to a specific protocol for tracking model use that goes beyond an acknowledgement that usage data is collected. We will also award this point for a reasonable best-effort attempt to describe monitoring mechanisms, or if a developer discloses that a distribution channel is not monitored.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Downstream applications (Score: 0)

Across all forms of downstream use, is the number of applications dependent on the foundation model disclosed?

Disclosure: Not disclosed

Note: We recognize that there does not exist an authoritative or consensus standard for what qualifies as an application. We will award this point if there is a meaningful estimate of the number of downstream applications, along with some description of what it means for an application to be dependent on the model.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Affected market sectors (Score: 0)

Across all downstream applications, is the fraction of applications corresponding to each market sector disclosed?

Disclosure: Not disclosed

Note: By market sector, we refer to an identifiable part of the economy. While established standards exist for describing market sectors, we recognize that developers may provide vague or informal characterizations of market impact. We will award this point if there is a meaningful, though potentially vague or incomplete, summary of affected market sectors.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Affected individuals (Score: 0)

Across all forms of downstream use, is the number of individuals affected by the foundation model disclosed?

Disclosure: Not disclosed

Note: By affected individuals, we principally mean the number of potential users of applications. We recognize that there does not exist an authoritative or consensus standard for what qualifies as an affected individual. We will award this point if there is a meaningful estimate of the number of affected individuals along with a clear description of what it means for an individual to be affected by the model.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Usage reports (Score: 0)

Is a usage report that gives usage statistics describing the impact of the model on users disclosed?

Disclosure: Not disclosed

Note: We recognize that there does not exist an authoritative or consensus standard for what is required in a usage report. Usage statistics might include, for example, a description of the major categories of harm that have been caused by use of the model. We will award this point if there is a meaningful, though potentially vague or incomplete, summary of usage statistics.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Geographic statistics (Score: 0)

Across all forms of downstream use, are statistics of model usage across geographies disclosed?

Disclosure: Not disclosed

Note: We will award this point if there is a meaningful, though potentially incomplete or vague, disclosure of geographic usage statistics at the country level.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Redress mechanism (Score: 0)

Is any mechanism to provide redress to users for harm disclosed?

Disclosure: Not disclosed

Note: We will also award this point if the developer reports it does not have any such redress mechanism.

References: Not disclosed

Justification: Not disclosed

New disclosure? No

Centralized documentation for downstream use (Score: 1)

Is documentation for downstream use centralized in a centralized artifact?

Disclosure: The AI Service Card for Titan Text is the centralized documentation resource for our customers (downstream users). It documents the intended use cases and limitations, potential risks and mitigations, and key considerations in the responsible use of the service (including use policies, privacy policies, service terms and more), and it links to all other related technical documentation (including developer guides).

Note: Centralized documentation for downstream use refers to an artifact, or closely-linked artifacts, that consolidate relevant information for making use of or repurposing the model. Examples of these kinds of artifacts include a website with dedicated documentation information, a GitHub repository with dedicated documentation information, and an ecosystem card. We recognize that different developers may take different approaches to centralizing information. We will award this point if there is a clearly-identified artifact(s) that contains the majority of substantive information (e.g. capabilities, limitations, risks, evaluations, distribution channels, model license, usage policies, model behavior policies, feedback and redress mechanisms, dependencies).

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No

Documentation for responsible downstream use (Score: 1)

Is documentation for responsible downstream use disclosed?

Disclosure: The AI Service Card for Titan Text is the centralized documentation resource for our customers (downstream users). It documents the intended use cases and limitations, potential risks and mitigations, and key considerations in the responsible use of the service (including use policies, privacy policies, service terms and more).

Note: Such documentation might include details on how to adjust API settings to promote responsible use, descriptions of how to implement mitigations, or guidelines for responsible use. We will also award this point if the developer states that it does not provide any such documentation. For example, the developer might state that the model is offered as is and downstream developers are accountable for using the model responsibly.

References: https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/

Justification: Not disclosed

New disclosure? No